Let's investigate the average salary of four teams:
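library(tidyverse)  # provides read_csv() and the dplyr verbs used below
# Reference URL (not used by the code below)
url <- "http://www.usatoday.com/sports/mlb/salaries/"
salaries <- read_csv("mlb_salaries.csv") %>%
  select(NAME, TEAM, POS, SALARY)
# Keep only the four teams of interest
salaries_trim <- salaries %>%
  filter(TEAM %in% c("LAD", "NYY", "TB", "ARI"))
# Mean salary and number of players per team
salaries_summary <- salaries_trim %>%
  group_by(TEAM) %>%
  summarize(mean_sal = mean(SALARY), count_team = n())
# Mean salary over all players on the four teams
grand_mean <- salaries_trim %>%
  summarize(mean(SALARY))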
The mean salary for ALL 120 players on those four teams is \[\mu = \$4,214,614\]
Remember, this is a summary value for the population. We hardly ever have the whole population to work with.
Suppose we select 40 players at random from the 120 total:
n <- 40
set.seed(20160129)
mean_srs <- salaries_trim %>%
  sample_n(n) %>%
  summarize(mean_srs_salary = mean(SALARY))
\(\bar{x}_{SRS} = \$4,248,380\)
Let's select 10 players from each of the 4 teams:
strat_n <- 10
mean_strat_by_team <- salaries_trim %>%
  group_by(TEAM) %>%
  sample_n(strat_n) %>%
  summarize(mean_by_team = mean(SALARY))
mean_strat <- mean_strat_by_team %>%
  summarize(mean(mean_by_team))
\(\bar{x}_{STRAT} = \$3,998,016\)
SRS: Absolute bias of $33,766.27
Stratified: Absolute bias of $216,597.90
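As a quick check, the two figures above appear to be the absolute difference between each sample mean and the population mean \(\mu\). Using the rounded values quoted earlier (so the results match the reported figures only up to rounding):

\[
|\bar{x}_{SRS} - \mu| = |\$4,248,380 - \$4,214,614| = \$33,766
\]
\[
|\bar{x}_{STRAT} - \mu| = |\$3,998,016 - \$4,214,614| = \$216,598
\]

These are the errors of one particular sample of each kind; a different seed would give different values.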
Control: Compare treatment of interest to a control group.
Randomization: Randomly assign subjects to treatments.
Replication: Within a study, replicate by collecting a sufficiently large sample, or replicate the entire study.
Blocking: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups.
Placebo: A fake treatment, often given to the control group in medical studies.
Placebo effect: Experimental units show improvement simply because they believe they are receiving a special treatment.
Blinding: Experimental units do not know whether they are in the control or the treatment group.
Double-blind: Neither the experimental units nor the researchers know who is in the control group and who is in the treatment group.
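The blocking and randomization principles above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the `subjects` data frame, the `risk` blocking variable, the group sizes, and the seed are all hypothetical and are not taken from any study in these notes.

```r
# Hypothetical example: 12 subjects, blocked on a made-up "risk" variable.
set.seed(1)  # arbitrary seed so the assignment is reproducible

subjects <- data.frame(
  id   = 1:12,
  risk = rep(c("low", "high"), each = 6)  # assumed blocking variable
)

# Randomize to treatment/control separately within each block,
# so both groups get the same number of low- and high-risk subjects.
subjects$group <- NA_character_
for (level in unique(subjects$risk)) {
  in_block <- subjects$risk == level
  labels <- rep(c("treatment", "control"), length.out = sum(in_block))
  subjects$group[in_block] <- sample(labels)  # random shuffle of the labels
}

# Each block should contribute 3 subjects to each group.
print(table(subjects$risk, subjects$group))
```

Because the shuffle happens inside each block, the blocking variable is balanced across the treatment and control groups by construction, while the assignment within a block is still random.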